Bump operator-sdk to v1.42.1 in operator and storage-operator #4818

Merged
bert-e merged 9 commits into development/133.0 from improvement/bump-operator-sdk-v1.42.1 on Mar 26, 2026

ezekiel-alexrod (Contributor) commented Mar 16, 2026

Summary

  • Bump operator-sdk from v1.37.x to v1.42.1 in both operator and storage-operator, using the upgrade tool added in #4816 (tools: add script to upgrade operator-sdk)
  • Replace panic(fmt.Errorf(...)) with structured logr.Error + os.Exit(1) in ClusterConfig controller
  • Add rollout restart of operator deployment after ingress tests to avoid stale CrashLoopBackOff

Changes

operator-sdk v1.37.x → v1.42.1 (operator & storage-operator)

Upgrade performed using the config-driven tool from #4816:

python3 tools/upgrade-operator-sdk/upgrade.py --operator-dir operator tools/upgrade-operator-sdk/operator
python3 tools/upgrade-operator-sdk/upgrade.py --operator-dir storage-operator tools/upgrade-operator-sdk/storage-operator

Incremental changes for each version, applied all at once using the tool:

  • v1.38.0 — Remove kube-rbac-proxy; expose metrics via native HTTPS endpoint; update golangci-lint to v1.59.1, controller-tools to v0.15.0, kustomize to v5.4.2
  • v1.39.0 — Upgrade k8s.io dependencies to v0.31.14 and controller-runtime to v0.19.5; update kustomize to v5.4.3 and controller-tools to v0.16.1; add network-policy scaffolding
  • v1.40.0 — Upgrade Go to 1.23.12 and k8s.io dependencies to v0.32.12 with controller-runtime v0.20.4; add metrics certificate watcher support; replace static ENVTEST versions with dynamic go list
  • v1.41.0 — Upgrade Go to 1.24.13 and k8s.io dependencies to v0.33.8 with controller-runtime v0.21.0; migrate golangci-lint to v2 config format; update controller-tools to v0.18.0; add Kind cluster targets for e2e tests
  • v1.42.1 — Upgrade Go to 1.25.8 and k8s.io dependencies to v0.33.9; scaffold generates COPY internal/ instead of COPY internal/controller/

After all version bumps, configuration files for both operators (config/, cmd/main.go, Makefile, Dockerfile, README.md, etc.) were realigned with the upstream scaffold. New scaffold additions include test/e2e/, .github/workflows/, and config/rbac/*_admin_role.yaml.

Fix: replace panic with structured log + os.Exit(1) in ClusterConfig controller

In operator/pkg/controller/clusterconfig/controller.go, the panic(fmt.Errorf(...)) raised when the main ClusterConfig is unexpectedly deleted has been replaced with logr.Error + os.Exit(1). The operator now writes a clean log entry and exits, which still triggers a pod restart but avoids an uncontrolled panic and its confusing stack trace.

Fix: rollout restart of operator after ingress tests

A new autouse module-scoped pytest fixture in tests/post/steps/test_ingress.py performs a kubectl rollout restart of the operator deployment at the end of the ingress test module. Ingress tests heavily modify the ClusterConfig, which can leave the operator in CrashLoopBackOff with growing backoff delays; the restart ensures a fresh pod before subsequent test modules.


Closes: MK8S-110, MK8S-111

ezekiel-alexrod requested a review from a team as a code owner on March 16, 2026 15:47

bert-e (Contributor) commented Mar 16, 2026

Hello ezekiel-alexrod,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

bert-e (Contributor) commented Mar 16, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

bert-e (Contributor) commented Mar 17, 2026

Conflict

There is a conflict between your branch improvement/bump-operator-sdk-v1.42.1 and the
destination branch development/133.0.

Please resolve the conflict on the feature branch (improvement/bump-operator-sdk-v1.42.1).

git fetch && \
git checkout origin/improvement/bump-operator-sdk-v1.42.1 && \
git merge origin/development/133.0

Resolve merge conflicts and commit

git push origin HEAD:improvement/bump-operator-sdk-v1.42.1

ezekiel-alexrod force-pushed the improvement/bump-operator-sdk-v1.42.1 branch from 4572b35 to 0b47fdb on March 18, 2026 17:40

bert-e (Contributor) commented Mar 18, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

ezekiel-alexrod force-pushed the improvement/bump-operator-sdk-v1.42.1 branch from 0b47fdb to c48e699 on March 19, 2026 16:21

bert-e (Contributor) commented Mar 19, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

ezekiel-alexrod force-pushed the improvement/bump-operator-sdk-v1.42.1 branch 2 times, most recently from 0d65a14 to 9562fe6 on March 20, 2026 11:49

python3 tools/upgrade-operator-sdk/upgrade.py --operator-dir operator tools/upgrade-operator-sdk/operator

Scaffold realigned with operator-sdk v1.42.1, Go bumped to 1.24.13,
k8s.io dependencies pinned to v0.33.10.

python3 tools/upgrade-operator-sdk/upgrade.py --operator-dir storage-operator tools/upgrade-operator-sdk/storage-operator

Scaffold realigned with operator-sdk v1.42.1, Go bumped to 1.24.13,
k8s.io dependencies pinned to v0.33.10.

ezekiel-alexrod force-pushed the improvement/bump-operator-sdk-v1.42.1 branch from 9562fe6 to bfd32fe on March 20, 2026 13:59

bert-e (Contributor) commented Mar 23, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

claude bot commented Mar 24, 2026

  • operator/cmd/main.go:172 and storage-operator/cmd/main.go:172: The metrics cert watcher error log has the wrong message ("to initialize..." instead of "Failed to initialize...") and passes err twice (as the first argument to Error() and as an "error", err key-value pair), producing a duplicate error in structured logs.
    - Fix: setupLog.Error(err, "Failed to initialize metrics certificate watcher")
  • CHANGELOG.md:63: The Go version is inconsistent across the PR: CHANGELOG says 1.24.13, the PR description says 1.25.8, go.mod has 1.24.0, and the Dockerfiles use golang:1.24.
    - Fix: align all references to the actual intended version.

    Review by Claude Code
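
A small script could catch the kind of Go version drift flagged in this review. The sketch below is only an illustration: the function names are hypothetical and not part of the PR, and it assumes the version can be read from the `go` directive in go.mod and from the `golang:<tag>` image in Dockerfile FROM lines.

```python
import re


def go_mod_version(go_mod_text):
    """Return the version in the `go` directive of a go.mod file, or None."""
    match = re.search(r"^go\s+(\d+(?:\.\d+){1,2})\s*$", go_mod_text, re.MULTILINE)
    return match.group(1) if match else None


def dockerfile_go_versions(dockerfile_text):
    """Return the version tag of every `FROM ... golang:<tag>` line."""
    pattern = r"^FROM\s+(?:\S+/)?golang:(\d+(?:\.\d+){1,2})"
    return re.findall(pattern, dockerfile_text, re.MULTILINE | re.IGNORECASE)


def group_by_minor(versions):
    """Group source names by major.minor version so mismatches stand out."""
    groups = {}
    for source, version in versions.items():
        minor = ".".join(version.split(".")[:2])
        groups.setdefault(minor, []).append(source)
    return groups


# Versions quoted in the review comment above.
reported = {
    "CHANGELOG.md": "1.24.13",
    "PR description": "1.25.8",
    "go.mod": "1.24.0",
    "Dockerfile": "1.24",
}
groups = group_by_minor(reported)
if len(groups) > 1:
    print("Go version mismatch:", groups)
```

Grouping by major.minor rather than comparing full strings is a design choice made here: it treats 1.24.0, 1.24.13 and the golang:1.24 image tag as compatible, and only reports the 1.25.8 mentioned in the PR description as the outlier.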

@@ -0,0 +1,329 @@
/*

The e2e test files (test/e2e/ and test/utils/) are scaffolded boilerplate that will never run in MetalK8s CI (they require Kind, make docker-build, etc.). Including them adds dead code to the repo. Consider deleting these scaffold-only test directories if they are not intended for use, or add a note in the PR about plans to integrate them.

— Claude Code

))
// so if, for whatever reason, this one get deleted we log the error
// and exit so that the operator restarts and re-creates the ClusterConfig
reqLogger.Error(nil, "ClusterConfig object should not be deleted, exiting")

The PR description mentions replacing panic(fmt.Errorf(...)) with logr.Error + os.Exit(1), but this code now passes a nil error to reqLogger.Error(). While logr allows nil errors, this loses the descriptive message that the old panic had. Consider creating an actual error to include context about why the exit is happening, e.g. reqLogger.Error(fmt.Errorf("main ClusterConfig object unexpectedly deleted"), "exiting").

— Claude Code

claude bot commented Mar 24, 2026

  • Scaffolded e2e test files (operator/test/e2e/, operator/test/utils/) are dead code that will not run in MetalK8s CI
    - Consider removing them or documenting plans to integrate them
  • reqLogger.Error(nil, ...) in clusterconfig controller passes nil error, losing context compared to the old panic(fmt.Errorf(...))
    - Pass an actual error value so structured logs capture the reason for the exit

    Review by Claude Code

ezekiel-alexrod force-pushed the improvement/bump-operator-sdk-v1.42.1 branch from 1a54a1c to de52219 on March 24, 2026 17:39

claude bot commented Mar 24, 2026

  • Broken error log message in both operator/cmd/main.go:172 and storage-operator/cmd/main.go:172: message says "to initialize metrics certificate watcher" instead of "Failed to initialize metrics certificate watcher", and err is passed twice (as the error arg and as a kv pair), duplicating error info in logs.
    - Fix: setupLog.Error(err, "Failed to initialize metrics certificate watcher")

    Review by Claude Code

bert-e (Contributor) commented Mar 25, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

ezekiel-alexrod (Contributor, Author) commented

/approve

bert-e (Contributor) commented Mar 26, 2026

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/133.0

The following branches will NOT be impacted:

  • development/123.0
  • development/124.0
  • development/124.1
  • development/125.0
  • development/126.0
  • development/127.0
  • development/128.0
  • development/129.0
  • development/130.0
  • development/131.0
  • development/132.0
  • development/2.0
  • development/2.1
  • development/2.10
  • development/2.11
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

bert-e (Contributor) commented Mar 26, 2026

I have successfully merged the changeset of this pull request
into targeted development branches:

  • ✔️ development/133.0

The following branches have NOT changed:

  • development/123.0
  • development/124.0
  • development/124.1
  • development/125.0
  • development/126.0
  • development/127.0
  • development/128.0
  • development/129.0
  • development/130.0
  • development/131.0
  • development/132.0
  • development/2.0
  • development/2.1
  • development/2.10
  • development/2.11
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

Please check the status of the associated issue None.

Goodbye ezekiel-alexrod.

bert-e merged commit cecaff4 into development/133.0 on Mar 26, 2026
31 checks passed
bert-e deleted the improvement/bump-operator-sdk-v1.42.1 branch on March 26, 2026 09:50